Implement top-k optimization #1960

acquamarin · 2023-08-25T21:12:48Z

This PR implements an optimization for top-k queries, i.e. queries that combine ORDER BY with LIMIT.
Sample top-k query:
MATCH (comment:Comment) RETURN comment.length, comment.creationDate ORDER BY comment.length, comment.creationDate LIMIT 5

Plan without top-k optimization: we first accumulate and sort all tuples, then apply the limit operator to the sorted result.
Plan with top-k optimization:
We merge the order by and limit operators into a single operator.
Instead of accumulating all tuples, each thread sorts locally and keeps only its top-k tuples.
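
The per-thread idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Row`, `LocalTopK`, `mergeTopK`, and the "compact once the buffer reaches 2k rows" threshold are all hypothetical names and choices made for the example, and the real operator works on encoded key blocks rather than plain structs.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical row type standing in for (comment.length, comment.creationDate).
struct Row {
    int64_t length;
    int64_t creationDate;
};

// Ascending order on (length, creationDate), matching the sample ORDER BY.
inline bool rowLess(const Row& a, const Row& b) {
    return std::tie(a.length, a.creationDate) < std::tie(b.length, b.creationDate);
}

// Per-thread state: buffers rows and periodically sorts and truncates to `limit`,
// so the buffer never holds more than 2 * limit rows (assumed threshold).
class LocalTopK {
public:
    explicit LocalTopK(std::size_t limit) : limit(limit) {}

    void append(const Row& row) {
        rows.push_back(row);
        if (rows.size() >= 2 * limit) {
            compact();
        }
    }

    // Returns this thread's top `limit` rows in sorted order.
    std::vector<Row> finalize() {
        compact();
        return rows;
    }

private:
    // Local sort: order the smallest `limit` rows and drop everything else.
    void compact() {
        std::size_t n = std::min(limit, rows.size());
        std::partial_sort(rows.begin(), rows.begin() + n, rows.end(), rowLess);
        rows.resize(n);
    }

    std::size_t limit;
    std::vector<Row> rows;
};

// Combines the per-thread results and keeps the global top `limit` rows.
inline std::vector<Row> mergeTopK(const std::vector<std::vector<Row>>& locals,
                                  std::size_t limit) {
    std::vector<Row> all;
    for (const auto& local : locals) {
        all.insert(all.end(), local.begin(), local.end());
    }
    std::size_t n = std::min(limit, all.size());
    std::partial_sort(all.begin(), all.begin() + n, all.end(), rowLess);
    all.resize(n);
    return all;
}
```

In this sketch each worker would own one `LocalTopK`, call `append` for every tuple it scans, and hand the result of `finalize()` to a single `mergeTopK` call. With T threads the final merge sees at most T * k rows instead of the full input.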

Performance numbers:
https://docs.google.com/spreadsheets/d/1K53Yz8KMuvrFfXbyoPyULhWt9nSBEUI6W4WdMWirODM/edit#gid=1649459535

codecov · 2023-08-25T21:35:23Z

Codecov Report

Patch coverage: 85.81% and project coverage change: -0.28% ⚠️

Comparison is base (ee029b9) 89.75% compared to head (06b4a2a) 89.48%.

❗ Current head 06b4a2a differs from pull request most recent head 27da653. Consider uploading reports for the commit 27da653 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1960      +/-   ##
==========================================
- Coverage   89.75%   89.48%   -0.28%     
==========================================
  Files         868      881      +13     
  Lines       32032    32348     +316     
==========================================
+ Hits        28750    28946     +196     
- Misses       3282     3402     +120

Files Changed	Coverage Δ
src/include/processor/operator/physical_operator.h	`100.00% <ø> (ø)`
src/include/processor/result/factorized_table.h	`96.55% <ø> (ø)`
src/processor/operator/physical_operator.cpp	`63.47% <0.00%> (-2.29%)`	⬇️
src/processor/processor.cpp	`100.00% <ø> (ø)`
src/processor/result/factorized_table.cpp	`95.68% <ø> (ø)`
src/processor/operator/order_by/top_k.cpp	`67.07% <67.07%> (ø)`
...ocessor/operator/order_by/order_by_key_encoder.cpp	`82.82% <84.61%> (-5.12%)`	⬇️
...ude/processor/operator/order_by/key_block_merger.h	`91.30% <100.00%> (+0.19%)`	⬆️
src/include/processor/operator/order_by/order_by.h	`100.00% <100.00%> (ø)`
...processor/operator/order_by/order_by_key_encoder.h	`100.00% <100.00%> (ø)`
... and 15 more

... and 57 files with indirect coverage changes


andyfengHKU

Let's fix the coverage for this PR before merging

src/include/processor/operator/order_by/order_by_key_encoder.h

src/include/processor/operator/order_by/sort_state.h

src/include/processor/operator/order_by/key_block_merger.h

src/processor/operator/order_by/sort_state.cpp

src/processor/operator/order_by/order_by_scan.cpp

src/include/processor/operator/order_by/top_k.h

src/processor/operator/order_by/top_k.cpp

andyfengHKU · 2023-08-26T19:00:23Z

src/include/processor/operator/order_by/top_k.h

+using vector_select_comparison_func =
+    std::function<bool(common::ValueVector&, common::ValueVector&, common::SelectionVector&)>;
+
+struct TopKScanState {


Leave a TODO for me to move this into mapper

acquamarin · 2023-08-26

I don't think we can move the initialization to the mapper.
The scan state needs access to the sortedKeyBlock, which does not exist until sorting has started.
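
The constraint in this reply can be illustrated with a small sketch of lazy initialization. It is not the actual `TopKScanState` code: `SortedKeyBlock`, `SharedSortState`, `ScanState`, and `LazyScanner` are hypothetical stand-ins, and the only point shown is that the scan state has to be built at execution time because the sorted block is not available when the plan is mapped.

```cpp
#include <cstddef>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical stand-in for the sorted key block produced by the sort phase.
struct SortedKeyBlock {
    std::vector<int> keys;
};

// Hypothetical shared state: the block stays null until sorting has produced it.
struct SharedSortState {
    std::shared_ptr<SortedKeyBlock> sortedKeyBlock;
};

// Scan cursor that can only be constructed once a sorted block exists.
struct ScanState {
    std::shared_ptr<SortedKeyBlock> block;
    std::size_t nextIdx;
};

class LazyScanner {
public:
    explicit LazyScanner(const SharedSortState& shared) : shared(shared) {}

    // Returns the next key, or nullopt if sorting has not produced a block yet
    // or the block is exhausted.
    std::optional<int> next() {
        if (!state.has_value()) {
            if (!shared.sortedKeyBlock) {
                return std::nullopt; // nothing to initialize from yet
            }
            // Deferred initialization: only possible after the sort phase.
            state = ScanState{shared.sortedKeyBlock, 0};
        }
        if (state->nextIdx >= state->block->keys.size()) {
            return std::nullopt;
        }
        return state->block->keys[state->nextIdx++];
    }

private:
    const SharedSortState& shared;
    std::optional<ScanState> state;
};
```

A mapper-time constructor would have nothing to capture in `ScanState::block`, which is why this sketch defers the initialization to the first `next()` call after sorting.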

andyfengHKU approved these changes Aug 26, 2023


acquamarin force-pushed the top-k branch 2 times, most recently from bfc463b to a7089cf Compare August 27, 2023 14:21

Implement top-k optimization

27da653

acquamarin force-pushed the top-k branch from a7089cf to 27da653 Compare August 27, 2023 15:56

acquamarin merged commit 8174d25 into master Aug 27, 2023
10 checks passed

acquamarin deleted the top-k branch August 27, 2023 16:40
